METHOD TO AUTOMATICALLY INCREASE THE SEGMENT SIZE OF UNIX FILES IN A PAGE SEGMENTED VIRTUAL MEMORY DATA PROCESSING SYSTEM

TECHNICAL FIELD

This invention relates in general to methods for use in a virtual memory data processing system, and in particular, to an improved method for automatically increasing the number of virtual page addresses that have been assigned to a segment of a page segmented virtual memory type data processing system after the segment has been initially created.

CROSS-REFERENCED APPLICATION

U.S. application Ser. No. 06/819,458, now U.S. Pat. No. 4,742,447, filed concurrently herewith in the name of Duvall et al., entitled "Method to Control I/O Accesses in a Multi-Tasking Virtual Memory Virtual Machine Type Data Processing System," is directed to a method for use in a multi-user paged segmented virtual memory data processing system in which a mapped file data structure is selectively created to permit all I/O operations to the secondary storage devices to be executed by simple load and store instructions under the control of the page fault handler.

DESCRIPTION OF THE PRIOR ART

The prior art discloses a number of data processing systems which are capable of running a UNIX type operating system. U.S. Pat. Nos. 4,536,837; 4,470,115; 4,104,718 and 4,047,244 are representative of the patents that describe this type of data processing system.

In addition, there are a number of publications and manuals which describe, at various levels, the architecture and operation of the UNIX operating system and the various versions, releases, and look-alike derivatives of the basic UNIX system. The following is a representative sample of such publications.

1. "A Tour Through the UNIX File System," James Joyce, October 1983, pp. 170-182, Byte Publications, Inc.

2. "UNIX as an Application Environment," Mark Krieger et al., October 1983, pp. 209-214, Byte Publications, Inc.

3. "The UNIX System Calls," Brian W. Kernighan et al., 1984, pp. 203-231, The UNIX Programming Environment.

4. "UNIX Time-Sharing: A Retrospective," D. M. Ritchie, January 1977, pp. 1947-1969, The Bell System Technical Journal, July-August 1978.

5. "UNIX Variant Opens a Path to Managing Multiprocessor Systems," Paul Jackson, July 1983, pp. 118-124, Electronics.

6. "UNIX-Berkeley 4.2 Gives UNIX Operating System Network Support," Bill Joy, July 1983, pp. 114-118, Electronics.

7. "The UNIX Tutorial, Part 1," David Fiedler, August 1983, pp. 186-219, Byte Publications, Inc.

8. "The UNIX Tutorial, Part 2," David Fiedler, September 1983, pp. 257-278, Byte Publications, Inc.

UNIX FILES

The fundamental structure that the UNIX operating system uses to store information is the file. A file is a sequence of bytes; each byte is typically 8 bits long and is equivalent to a character. UNIX keeps track of files internally by assigning each one a unique identifying number. These numbers, called inode numbers, are used only within the UNIX operating system kernel itself. While UNIX uses inode numbers to refer to files, it allows users to identify each file by a user-assigned name. A file name can be any sequence containing from one to fourteen characters.

There are three types of files in the UNIX file system: (1) ordinary files, which may be executable programs, text, or other types of data used as input or produced as output from some operation, (2) directory files, which contain lists of files, and (3) special files, which provide a standard method of accessing I/O devices.

UNIX DIRECTORIES

UNIX provides users a way of organizing files. Files may be grouped into directories. Internally, a directory is a file which contains the names of ordinary files and other directories, and their corresponding inode numbers. Given the name of a file, UNIX looks in the file's directory and obtains the corresponding inode number for the file. With this inode number, UNIX can examine other internal tables to determine where the file is stored and to make it accessible to the user. UNIX directories themselves have names, each of which may contain up to fourteen characters.

UNIX HIERARCHICAL FILE SYSTEM

Just as directories provide a means for users to group files, UNIX supports the grouping of directories into a hierarchical file system. At the very top of a hierarchy is a directory. It may contain the names of individual files and the names of other directories. These, in turn, may contain the names of individual files and still other directories, and so on. A hierarchy of files is the result. The UNIX file hierarchy resembles an upside-down tree, with its root at the top. The various directories branch out until they finally trace a path to the individual files, which correspond to the tree's leaves. The UNIX file system is described as "tree-structured," with a single directory at its root. All the files that can be reached by tracing a path down through the directory hierarchy from the root directory constitute the file system.

UNIX FILE SYSTEM ORGANIZATION

UNIX maintains a great deal of information about the files that it manages. For each file, the file system keeps track of the file's size, location, ownership, security, type, creation time, modification time, and access time. All of this information is maintained automatically by the file system as the files are created and used. UNIX file systems reside on mass storage devices such as disk files. These disk files may use fixed or removable media, which may be rigid or flexible. UNIX organizes a disk as a sequence of blocks, which compose the file system. These blocks are usually either 512 or 2048 bytes long. The contents of a file are stored in one or more blocks, which may be widely scattered on the disk.

An ordinary file is addressed through the inode structure. Each inode is addressed by an index contained in an i-list. The i-list is generated based on the size of the file system, with larger file systems generally implying more files, and thus larger i-lists. Each inode contains thirteen 4-byte disk address elements. The direct inode can contain up to ten block addresses. If the file is larger
than this, then the eleventh address points to the first level indirect block. Addresses 12 and 13 are used for second level and third level indirect blocks, respectively, with the indirect addressing chain before the first data block growing by one level as each new address slot in the direct inode is required.
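The addressing rule above can be made concrete with a short C sketch that reports which of the thirteen address slots covers a given logical block of a file. It assumes that each indirect block holds block_size / 4 of the 4-byte disk addresses (128 for 512-byte blocks, 512 for 2048-byte blocks); the function name inode_level is hypothetical and not a UNIX kernel interface.

```c
#include <assert.h>

#define NDIRECT 10 /* address slots 1-10 hold direct block addresses */

/*
 * Hypothetical helper for illustration only.
 * Returns 0 if logical block lbn is reached through a direct address,
 * 1, 2 or 3 if it needs the first, second or third level indirect block
 * (address slots 11, 12 and 13), or -1 if the file cannot be that large.
 * ptrs_per_block is the number of 4-byte addresses in one indirect block.
 */
int inode_level(unsigned long long lbn, unsigned long long ptrs_per_block)
{
    assert(ptrs_per_block > 0);
    if (lbn < NDIRECT)
        return 0;
    lbn -= NDIRECT;

    unsigned long long span = ptrs_per_block; /* blocks reachable at this level */
    for (int level = 1; level <= 3; level++) {
        if (lbn < span)
            return level;
        lbn -= span;
        span *= ptrs_per_block; /* each extra level multiplies the reach */
    }
    return -1;
}
```

For example, with 512-byte blocks, blocks 0 through 9 are reached directly, blocks 10 through 137 through the first level indirect block, and block 138 is the first that needs the second level indirect block.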

UNIX FILE SYSTEM ACCESS VIA 
READ/WRITE SYSTEM CALLS 

All input and output is done by reading and writing files, because all peripheral devices, even terminals, are files in the file system. In the most general case, before reading and writing a file, it is necessary to inform the system of your intent to do so by opening the file. In order to write to a file, it may also be necessary to create it. When a file is opened or created (by way of the 'open' or 'creat' system calls), the system checks for the right to do so, and if all is well, returns a non-negative integer called a file descriptor. Whenever I/O is to be done on this file, the file descriptor is used instead of the name to identify the file. This open file descriptor has associated with it a file table entry kept in the "process" space of the user who has opened the file. In UNIX terminology, the term "process" is used interchangeably with a program that is being executed. The file table entry contains information about an open file, including an inode pointer for the file and the file pointer for the file, which defines the current position to be read or written in the file. All information about an open file is maintained by the system.

In conventional UNIX, all input and output is done by two system calls, 'read' and 'write,' which are accessed from programs having functions of the same name. For both system calls, the first argument is a file descriptor. The second argument is a pointer to a buffer that serves as the data source or destination. The third argument is the number of bytes to be transferred. Each 'read' or 'write' system call returns the number of bytes transferred. On reading, the number of bytes returned may be less than the number requested, for example because fewer than the number requested remained to be read. A return value of zero implies end of file; a return value of -1 indicates an error of some sort. For writing, the value returned is the number of bytes actually written. An error has occurred if this is not equal to the number supposed to be written.
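The calling conventions above can be illustrated with a short C sketch, assuming a POSIX system. The helper copy_fd is hypothetical; it loops over 'read' until the zero (end-of-file) return, treats -1 as an error, and, following the text, treats a write that transfers fewer bytes than requested as an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Copy everything readable from in_fd to out_fd using only the 'read'
 * and 'write' system calls. Returns the number of bytes copied, or -1.
 */
long copy_fd(int in_fd, int out_fd)
{
    char buf[2048];   /* destination for read, source for write */
    long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf); /* may be < sizeof buf */
        if (n == 0)
            return total;                  /* zero means end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted by a signal: retry */
            return -1;                     /* -1 means an error of some sort */
        }
        /* As in the text: writing fewer bytes than requested is an error. */
        if (write(out_fd, buf, (size_t)n) != n)
            return -1;
        total += n;
    }
}
```

On a pipe or terminal, a single 'read' may return fewer bytes than requested even before end of file, which is why the loop keeps calling 'read' until it returns zero.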

The 'read' and 'write' system calls' parameters may be manipulated by the application program which is accessing the file. The application must therefore be sensitive to and take advantage of the multi-level store characteristics inherent in a standard system memory hierarchy. It is advantageous, from the application perspective, if the system memory components can be viewed as a single level hierarchy. If this were properly done, the application could dispense with most of the I/O overhead.

The prior art also discloses a number of multi-tasking virtual memory data processing systems in which the system architecture is based on establishing a different "virtual machine" or terminal for each of the applications that are run concurrently on the system. In such systems, the operating system executes in a virtual machine which is established by a Virtual Resource Manager. The Virtual Resource Manager (VRM) is a group of programs or processes that extend the system's processor or microprocessor and the system's memory management unit, to provide a high level port for the operating system in a virtual machine environment.



A software interface between the operating system and the programs of the Virtual Resource Manager is established and is referred to as the Virtual Machine Interface (VMI). A virtual machine, therefore, has a very high-level, physical machine-like interface.

In most prior art systems which operate in a multi- 
tasking virtual machine environment, the Virtual Re- 
source Manager provides the virtual machine with vir- 
tual memory that is transferred to the virtual machine. 
Various arrangements for managing the address space 
of the virtual memory are used by these prior art virtual 
memory systems. In one well-known technique, re- 
ferred to as "Paged Segmentation," the entire address 
range of the virtual memory is divided into equal-sized 
segments. The virtual address, therefore, comprises two portions: a segment ID and an offset. For example, if the virtual address space comprises $2^{40}$ address locations, a virtual address consisting of 40 bits is required. If a segment identifier of 12 bits and an offset of 28 bits are used for the format of the virtual address, then $2^{12}$, or 4,096, separate segments are provided, with each segment having $2^{28}$, or 256M, separate address locations. If each address location holds one byte, and one page of data holds 2048 (2K) bytes, then the capacity of the virtual memory is 1 terabyte ($2^{40}$ bytes), or $2^{29}$ pages.
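The arithmetic of this example can be checked with a few lines of C. The macro and function names are hypothetical; the sketch only encodes the 12-bit segment ID and 28-bit offset split and the 2K page size given above, assuming byte addressing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; values follow the example above. */
#define SEG_ID_BITS 12u                          /* 2^12 = 4,096 segments */
#define OFFSET_BITS 28u                          /* 2^28 bytes = 256M per segment */
#define VADDR_BITS  (SEG_ID_BITS + OFFSET_BITS)  /* 40-bit virtual address */
#define PAGE_BYTES  2048u                        /* one 2K page */

/* Build a 40-bit virtual address from a segment ID and an offset. */
static uint64_t va_make(uint32_t seg_id, uint32_t offset)
{
    assert(seg_id < (1u << SEG_ID_BITS));
    assert(offset < (1u << OFFSET_BITS));
    return ((uint64_t)seg_id << OFFSET_BITS) | offset;
}

/* Split a virtual address back into its two portions. */
static uint32_t va_segment_id(uint64_t va)
{
    return (uint32_t)(va >> OFFSET_BITS) & ((1u << SEG_ID_BITS) - 1);
}

static uint32_t va_offset(uint64_t va)
{
    return (uint32_t)(va & ((UINT64_C(1) << OFFSET_BITS) - 1));
}
```

With these values a single segment spans 131,072 pages of 2K bytes each.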

These prior art systems also employ different ar- 
rangements for generating the virtual address, depend- 
ing on the architecture of the system processor. One 
technique employed by processors which have an effective real memory address of 32 bits is to employ a predetermined number "n" of the high order address bits to select one of $2^n$ segment ID registers, each of which is capable of storing a segment ID having the required length. In the previous example of the 40 bit virtual address, n is 4, so there are 16 segment registers, and each segment register would have 12 stages for storing a 12 bit segment ID, which is concatenated with the remaining 28 bits of the processor's effective real address to provide the offset portion of the 40 bit virtual address.

A virtual machine that is created by the VRM gener- 
ally will define a number of memory segments with 
which it will be working at any one time. To access data 
in one of the segments, the virtual machine loads a segment identifier into one of the 16 segment registers, using the previous example of the addressing technique. Segments that are selected by the virtual machine are
usually private, unless the virtual machine grants access 
to other virtual machines. Access to segments can be 
controlled by the operating system of the virtual ma- 
chine. 

A virtual memory system generally employs a page 
faulting mechanism which functions to control the pag- 
ing of data between the system memory and the disk 
files. These storage devices are often referred to as 
primary and secondary storage, or front and back store 
devices. The paging function is, to some extent, similar 
to I/O operations run by the application program. So 
much so, that in some simple paging systems, a conflict 
arises between file I/O operations which are under 
control of the application program and the operating 
system, and paging operations. For example, a file device driver may read disk data into a memory buffer,
then the paging system, acting independently, may 
write the newly buffered data back out to the disk. 
When there is no coordination of effort between the file 
I/O subsystem and the paging I/O subsystem, potential 
duplication exists with program loading, in that the loader will read a program from the library section of the back store to the front store, while the paging I/O function will return the program to a different disk address during a page out operation.

It is, therefore, important that the data processing system provide a degree of coordination between the two similar functions, and various arrangements have been disclosed in the prior art for achieving this coordination. However, the required coordination does have an impact on system performance, and prior art coordination techniques become quite unmanageable when an attempt is made to implement them in a multi-tasking, multi-user virtual memory UNIX-type environment employing a very large virtual memory.

In accordance with the method of the cross-referenced application, a virtual machine environment is established in which all file I/O operations can be assigned to the page faulting mechanism of the memory manager unit which is part of the Virtual Resource Manager that establishes the virtual machine. Support for the UNIX read and write system calls to UNIX-structured files is maintained, as are the conventional data structures employed by the page faulting mechanism. Structures such as the External Page Table, for recording the correspondence between addresses in virtual memory and the real address location of data on the disk file, are maintained, as is the Inverted Page Table, which correlates virtual addresses of pages in system memory with system memory real addresses.

In the system described in the cross-referenced application, the Virtual Memory Manager allows the data contained within a segment to be associated with files in the virtual machine's file system, thus allowing that data to exist after the execution of a program. This association of file data to virtual pages is achieved through what are referred to as mapped files, or mapping of files.

The Map Page Range service that is established is provided to allow a virtual machine the ability to create a one-level store environment for a subset, such as the mapping of an individual file. It should be noted that most operating systems, such as the UNIX operating system, provide the ability for an application program to access disk files through the conventional I/O system calls. On the other hand, application programs generally do not have the ability, independent of an operating system, to access secondary storage files, such as disk drives. Application programs, however, are designed to operate intimately with the microprocessor to address system memory by so-called load and store instructions.

The Map Page Range service allows a virtual machine to create a "one-level store" environment. This service is necessary because neither the operating system executing in the virtual machine nor the Virtual Resource Manager has the capability by itself to map a file. The virtual machine does not have access to the Virtual Memory Manager's tables, and the Virtual Resource Manager is designed to be independent of the virtual machine's file system structure. The Map Page Range service provides the virtual machine the ability to tell the VMM the relationship between a logical entity, such as a file, and its location on the disk.

The method selectively maps the disk blocks of a file to a different memory segment. The mapping process dynamically maps a range of blocks (one block contains one page, equal to 2K) that have been allocated on the disk storage device for a given file. The map is stored in real memory, in space assigned to the virtual machine. Once this mapping is achieved, a program running in a virtual machine can execute machine level data access operations without regard to the physical location of the data at the time of access. If the data is not in active memory at the time of reference, then a page fault is induced. The underlying paging system resolves the page fault by referencing the address location, and if the data is actually allocated on a secondary storage device, then this address location will correspond to a physical location on the secondary device which is defined by the page mapping for that segment address. Each virtual address in the segment range has, at most, one physical data location on the secondary device. Conversely, each physical data location on the secondary device may be referenced by 2,048 separate segment addresses, one for each byte of the 2K block. The logical and physical extent of the relationship between a page and a block of data is 2K bytes. The specified address is then rounded down to the first 2K boundary to obtain the actual secondary device location.

An enhancement to the kernel of the operating system provides Map Page Range support in the form of mapped executables. When a program is loaded, the kernel maps the program's disk blocks to virtual memory text and data segments. In UNIX terminology, "text" is the part of the program that is executed, whereas variables such as tables are referred to as "data." The kernel performs little physical I/O to load the program; only the file header is read by the kernel. All remaining disk I/O is demand paged as the program is executed. This results in a significant performance increase for large programs which, without Map Page Range support, would have to be read into memory and possibly paged out by the paging supervisor.

This map file support consists of a system call interface to the Map Page Range facilities. The prior art UNIX system call "SHMAT" has been modified to include a flag bit which may be specified in the SHMAT system call in accordance with the present method. When the SHM-MAP flag is specified, the data file associated with the specified open file descriptor is mapped to the address space of the calling process. When the file has been successfully mapped, the segment start address of the mapped file is returned. The data file to be mapped must be a regular file residing on the secondary storage device. Optional flags may be supplied with the "SHMAT" system call to specify how the file is to be mapped. The different ways in which the files are to be mapped correspond generally to those available in the basic UNIX system, namely read-only, write-only, and copy-on-write (SHM-COPY).

All processes that map the same file, read-only or read-write, map to the same virtual memory segment. This segment remains mapped until the last process mapping the file closes it. All processes that map the same file copy-on-write map to the same copy-on-write segment. Changes to the copy-on-write segment do not affect the contents of the file resident in the file system until specifically requested by the user, by issuing a special command referred to as "fsync." If a process requests copy-on-write mapping for a file and the copy-on-write segment does not yet exist, then it is created, and that segment is maintained for sharing until the last process attached to it detaches it with a close
system call, at which time the segment is destroyed. The next request for a copy-on-write mapping for the same file causes a new segment to be created for the file.

A file descriptor can be used to map the corresponding file only once. A file may be multiply mapped by using multiple file descriptors (resulting from multiple "open" system calls); however, a file cannot be mapped both read-write and copy-on-write by one or more users at the same time.
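The SHMAT-based mapping described above is specific to this system; the SHM-MAP and SHM-COPY flags are not part of standard UNIX today. As a hedged modern analogy only, the POSIX mmap call offers similar behavior: MAP_SHARED makes stores visible in the file, comparable to a read-write mapping, while MAP_PRIVATE gives copy-on-write pages whose changes are never written back to the file (unlike the SHM-COPY segment, which can be committed with fsync). The helper names make_scratch_file and map_open_file are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch regular file holding len bytes.
   Returns a read-write file descriptor, or -1 on failure. */
int make_scratch_file(char *path_template, const char *data, size_t len)
{
    int fd = mkstemp(path_template);    /* template must end in XXXXXX */
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper (POSIX analogy, not the SHMAT interface):
 * map the first len bytes of an open regular file into the caller's
 * address space. copy_on_write != 0 requests private copy-on-write pages;
 * otherwise stores are shared with the file and with other mappers.
 */
void *map_open_file(int fd, size_t len, int copy_on_write)
{
    assert(len > 0);
    int flags = copy_on_write ? MAP_PRIVATE : MAP_SHARED;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

One difference is worth noting: unlike the restriction stated above for SHMAT, POSIX does not forbid holding a shared and a private mapping of the same file at the same time.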

A general system flow for a mapped file reference is described for the following scenario. In this scenario, the application attempts to reference a data area in a file which is not currently in memory. This reference causes a memory fault, and the process which is running the application is placed in a wait state. The Virtual Resource Manager allocates a page in memory for the new data. It then determines the physical address at which the data resides on disk, from the file map created earlier for the file by the map file services function. A start I/O operation is initiated to disk, the disk adapter primes the memory location with the 2K byte data block from the file, and an interrupt is issued to the virtual machine, i.e., the UNIX kernel, which does a context switch to permit the operating system to take control. The process is made dispatchable, and the operating system kernel then returns control to the Virtual Resource Manager, which then re-dispatches the process. When a file is mapped onto a segment, the file may be referenced directly by accessing the segment with load and store instructions, as previously indicated. The virtual memory paging system automatically takes care of the physical I/O. However, references beyond the end of the file cause a problem, as do references to so-called "holes" in the file, which exist because the file is a sparse-type file or because a portion of the file has been intentionally deleted.
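The lookup step in this scenario, finding the disk location from the file map, might be sketched as follows. The array layout of the file map, the use of 0 to mark an unallocated block (a hole), the numbering of disk blocks in 2K units, and the name resolve_mapped_fault are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 2048u   /* one page == one 2K disk block */
#define NO_BLOCK   0u      /* assumed marker for a hole in the file */

/*
 * Hypothetical helper for illustration only.
 * Translate a byte offset within a mapped segment into the byte address
 * of the 2K disk block that backs it. Returns -1 when the fault cannot be
 * serviced: the page lies beyond the mapped range or falls in a hole.
 */
int64_t resolve_mapped_fault(const uint32_t *file_map, uint32_t mapped_pages,
                             uint32_t seg_offset)
{
    assert(file_map != NULL || mapped_pages == 0);
    uint32_t page = seg_offset / PAGE_BYTES;  /* round down to a 2K boundary */
    if (page >= mapped_pages)
        return -1;                            /* beyond the mapped range */
    if (file_map[page] == NO_BLOCK)
        return -1;                            /* hole: no block allocated */
    return (int64_t)file_map[page] * PAGE_BYTES;
}
```

The two -1 cases correspond to the two problem references named above: addresses beyond the end of the file and addresses that fall in a hole.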

Since the map that was created by the Map Page Range service reflects actual storage locations on the secondary storage device that have been assigned on the basis of the size of the object that is to be stored in the segment, only the approximate number of virtual addresses that are needed to store the object are actually assigned. In creating the segment, virtual addresses are assigned in blocks which, in the preferred embodiment, are 64K blocks. If the file expands beyond the 64K border, a problem is created, since there is an unmapped area of virtual memory. A similar situation arises when the virtual address range assigned to store an object does not include contiguous virtual addresses. The so-called "hole" in the virtual address range is not mapped, since no physical space on the secondary storage device has been allocated. If an application addresses a virtual address that lies in the hole, the page fault cannot be serviced. Also, addresses beyond the current end of the file but within the 64K block boundary must be taken care of, since pages at these addresses require a change in status to be recorded.

In accordance with the present invention, an improved method for overcoming these problems is provided.

SUMMARY OF INVENTION 

In accordance with the method of the present invention, an attempt by an application to address a virtual page after a file has been opened and the file mapped to another segment is treated as a protection exception. In
one situation, the virtual page address of the desired 
page is greater than the current segment size, but still 
within the segment boundary, since address space is
allocated in 64K blocks. In creating a segment, the size, 
in terms of the number of pages or page addresses, is 
determined by the actual number of page addresses 
required. The remaining pages up to the next 64K 
boundary are protected by a flag from being written 
into. 

In the other situation, the requested page address is 
beyond the 64K boundary. 

Both situations are handled as a protection violation. 
In the first situation, where the address is not beyond 
the 64K boundary of the segment, pages are allocated 
by a supervisory call of the UNIX kernel and one or 
more disk blocks are allocated from a list of available 
blocks. The Map Page Range service then extends the 
range of the mapped segment. In the second situation 
involving an address beyond the current boundary but 
below the maximal permissible segment size, the size of 
the mapped segment is increased in 64K blocks, up to 
the needed size. Pages within the last 64K block which 
are not used for the file are again protected. 
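The two situations can be sketched in C as follows, assuming 2K pages (so a 64K block is 32 pages) and taking the 256-megabyte segment of the preferred embodiment as the maximal permissible segment size. The structure, enumeration, and function names are hypothetical, and the sketch only updates sizes; allocating disk blocks through the kernel and calling the Map Page Range service are left out.

```c
#include <assert.h>

/* Hypothetical constants and names for illustration only. */
#define PAGE_BYTES    2048u
#define BLOCK_PAGES   (65536u / PAGE_BYTES)      /* 32 pages per 64K block */
#define MAX_SEG_PAGES (268435456u / PAGE_BYTES)  /* 256M segment = 131,072 pages */

typedef enum { GROW_NONE, GROW_WITHIN_BLOCK, GROW_NEW_BLOCKS, GROW_REJECT } grow_result;

typedef struct {
    unsigned size_pages;   /* pages actually used by the file */
    unsigned alloc_pages;  /* virtual pages assigned: a multiple of BLOCK_PAGES */
} segment_info;

/* Pages assigned to the segment but still protected against writes. */
unsigned protected_pages(const segment_info *s)
{
    return s->alloc_pages - s->size_pages;
}

/* Handle a protection exception raised by a reference to virtual page `page`. */
grow_result grow_segment(segment_info *s, unsigned page)
{
    assert(s->alloc_pages % BLOCK_PAGES == 0 && s->size_pages <= s->alloc_pages);
    if (page < s->size_pages)
        return GROW_NONE;                 /* already part of the file */
    if (page >= MAX_SEG_PAGES)
        return GROW_REJECT;               /* beyond the largest segment */
    if (page < s->alloc_pages) {
        s->size_pages = page + 1;         /* first situation: inside the last 64K block */
        return GROW_WITHIN_BLOCK;
    }
    /* Second situation: add 64K blocks until the page is covered. */
    s->alloc_pages = (page / BLOCK_PAGES + 1) * BLOCK_PAGES;
    s->size_pages = page + 1;             /* rest of the last block stays protected */
    return GROW_NEW_BLOCKS;
}
```

For example, a segment using 10 pages of a single 64K block grows to 21 pages without new blocks when page 20 is referenced, while a reference to page 40 extends it to two 64K blocks (64 pages), leaving 23 pages protected.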

It is therefore an object of the present invention to 
provide an improved method for use in a page seg- 
mented virtual memory data processing system for in- 
creasing the size of a segment automatically, in response 
to a request for a virtual page whose address is greater 
than the segment size. 

Another object of the present invention is to provide 
an improved method for use by a page segmented vir- 
tual memory system to automatically increase the num- 
ber of virtual pages that were initially assigned in the 
segment and the number of corresponding storage 
blocks on a secondary storage device that were initially 
assigned in response to a request by an application pro- 
cess for access to a virtual page address that is beyond 
the address range as initially established. 

Objects and advantages other than those mentioned 
above will become apparent from the following de- 
scription, when read in connection with the drawing. 

BRIEF DESCRIPTION OF THE DRAWING 

FIG. 1 is a schematic illustration of a virtual memory 
system in which the method of the present invention 
may be advantageously employed. 

FIG. 2 illustrates the interrelationship of the Virtual 
Resource Manager shown in FIG. 1 to the data process- 
ing system and a virtual machine. 

FIG. 3 illustrates the virtual storage model for the 
system shown in FIG. 1. 

FIG. 4 illustrates conceptually the address translation function of the system shown in FIG. 1.

FIG. 5 illustrates the interrelationships of some of the 
data structures employed in the system of FIG. 1. 

FIG. 6 illustrates the interrelationship of a number of 
data structures to the Virtual Resource Manager, the 
virtual memory, and real memory. 

FIGS. 7a through 7c are flow charts, illustrating the
various steps involved in mapping and accessing a file in 
response to various UNIX System Calls in accordance 
with the present invention. 

FIG. 8 is a flowchart, illustrating the steps involved 
in increasing the number of virtual pages in a segment of 
memory after it has been created. 

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

System Overview: FIG. 1 is a schematic illustration 
of a virtual memory system in which the method of the 
present invention is employed. As shown in FIG. 1, the system comprises a hardware section 10 and a software or programming section 11. Hardware section 10, as shown, comprises a processor function 12, a memory management function 13, a system memory function or RAM 14, system bus 15, an Input/Output Channel Controller (IOCC) 16, and an Input/Output bus 21. The hardware section further includes a group of I/O devices attached to the I/O bus 21 through the IOCC 16, including a disk storage function 17, a display function 18, a co-processor function 19, and block 20, representing other I/O devices such as a keyboard or mouse-type device.

The program section of the system includes the application program 22 that is to be run on the system, a group of application development programs 23, or tools to assist in developing new applications, an operating system kernel 24, which, for example, may be an extension of the UNIX System V kernel, and a Virtual Resource Manager program 25, which functions to permit a number of virtual machines to be created, each of which is running a different operating system, but sharing the system resources. The system may operate, therefore, in a multi-tasking, multi-user environment, which is one of the main reasons for requiring a large virtual memory type storage system.

FIG. 2 illustrates the relationship of the Virtual Resource Manager 25 to the other components of the system. As shown in FIG. 2, a virtual machine includes one or more application programs such as 22a-22c and at least one operating system 30. A virtual machine interface 31 is established between the virtual machine and the VRM 25. A hardware interface 32 is also established between the VRM 25 and the hardware section 10. The VRM 25 supports virtual memory. It can be assumed, for purposes of explanation, that the memory capabilities of the hardware shown in FIG. 1 include a 24 bit address space for system memory 14, which equates to a capacity of 16 megabytes for memory 14, and a 40 bit address space for virtual memory, which equates to 1 terabyte of memory. A paged segmentation technique is implemented for the Memory Management Unit 13, so that the total virtual address space is divided into 4,096 memory segments, with each memory segment occupying 256 megabytes.

FIG. 3 illustrates the virtual storage model. The processor 12 provides a 32 bit effective address which is specified, for example, by the application program. The high order 4 bits of the 32 bit address function to select 1 of 16 segment registers which are located in the Memory Management Unit (MMU) 13. Each segment register contains a 12 bit segment ID section, along with other special control-type bits. The 12 bit segment ID is concatenated with the remaining 28 bits of the initial effective address to provide the 40 bit virtual address for the system. The 40 bit virtual address is subsequently translated to a 24 bit real address, which is used to address the system memory 14.
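A C sketch of this effective-to-virtual step follows. It models only the 12-bit segment ID field of each of the 16 segment registers and ignores the control bits; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 16 MMU segment registers: only the 12-bit
   segment ID field is represented; the control bits are omitted. */
typedef struct {
    uint16_t seg_id[16];
} segment_registers;

/* Form the 40-bit virtual address for a 32-bit effective address. */
uint64_t effective_to_virtual(const segment_registers *sr, uint32_t ea)
{
    assert(sr != NULL);
    unsigned reg = ea >> 28;                  /* high-order 4 bits: 1 of 16 */
    uint32_t offset = ea & 0x0FFFFFFFu;       /* remaining 28 bits */
    uint64_t seg = sr->seg_id[reg] & 0xFFFu;  /* 12-bit segment ID */
    return (seg << 28) | offset;              /* concatenate: 12 + 28 = 40 bits */
}
```

For example, if segment register 3 holds segment ID 0x5A5, the effective address 0x31234567 becomes the virtual address 0x5A51234567.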

The MMU 13 utilizes a Translation Look-aside Buffer (TLB) to contain translations of the most recently used virtual addresses. Hardware is used to automatically update TLB entries from main storage page tables as new virtual addresses are presented to the TLBs for translation.

FIG. 4 illustrates conceptually the TLB reload function.

The 40 bit virtual addresses are loaded into the TLB by looking them up in an Inverted Page Table (IPT), as shown in FIG. 4. The table is "inverted" because it contains one entry for each real memory page, rather than one per virtual page. Thus, a fixed portion of real memory is required for the IPT, regardless of the number of processes or virtual segments supported. To translate an address, a hashing function is applied to the virtual page number (high order part of the 40 bit virtual address, less the page offset) to obtain an index to the Hash Anchor Table (HAT). Each HAT entry points to a chain of IPT entries with the same hash value. A linear search of the hash chain yields the IPT entry and, thus, the real page number which corresponds to the original 40 bit virtual address. If no such entry is found, then the virtual page has not been mapped into the system, and a page fault interrupt is taken.
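The HAT and IPT lookup can be sketched as below. This is a minimal model, not the hardware format: it assumes 2K pages (implied by 32 XPT entries mapping 64K), uses a simple modulo as a stand-in for the unspecified hashing function, and the class and method names are hypothetical. The IPT index itself is the real page number, which is what makes the table "inverted".

```python
PAGE_OFFSET_BITS = 11  # assumption: 2K pages


class PageFault(Exception):
    """Stands in for the page fault interrupt taken when no IPT entry matches."""


class InvertedPageTable:
    def __init__(self, real_pages, hat_size):
        # One slot per real page frame: the virtual page it holds and the
        # index of the next IPT entry on the same hash chain.
        self.virtual_page = [None] * real_pages
        self.next_in_chain = [None] * real_pages
        # Hash Anchor Table: index of the first IPT entry on each chain.
        self.hat = [None] * hat_size

    def _hash(self, vpn):
        return vpn % len(self.hat)  # hypothetical hash function

    def map(self, vpn, real_page):
        """Push real_page onto the chain for vpn (assumes it is on no chain yet)."""
        bucket = self._hash(vpn)
        self.virtual_page[real_page] = vpn
        self.next_in_chain[real_page] = self.hat[bucket]
        self.hat[bucket] = real_page

    def translate(self, virtual_address):
        vpn = virtual_address >> PAGE_OFFSET_BITS
        offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
        index = self.hat[self._hash(vpn)]
        while index is not None:  # linear search of the hash chain
            if self.virtual_page[index] == vpn:
                return (index << PAGE_OFFSET_BITS) | offset  # real address
            index = self.next_in_chain[index]
        raise PageFault(hex(virtual_address))
```

Note that the table size depends only on the number of real frames, matching the point that the IPT occupies a fixed portion of real memory.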

The function of the Page Fault Handler (PFH) is to 
assign real memory to the referenced virtual page and 
to perform the necessary I/O to transfer the requested 
data into the real memory. The system is, thus, a de- 
mand paging type system. 

When real memory becomes full, the PFH is also 
responsible for selecting which page of data is paged 
out. The selection is done by a suitable algorithm such 
as a clock page replacement algorithm, where pages are 
replaced based on when the page was last used or refer- 
enced. Pages are transferred out to disk storage. 
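A clock page replacement policy of the kind named here (the page fault handler description specifies clock with second chance) can be sketched as follows. This is an illustration under stated assumptions: each frame has a referenced bit set on access, and the class and method names are hypothetical; the actual handler works on IPT entries and hardware reference information.

```python
class ClockReplacer:
    def __init__(self, frame_count):
        self.referenced = [False] * frame_count  # one reference bit per frame
        self.hand = 0                            # current clock hand position

    def touch(self, frame):
        """Record a reference to a frame."""
        self.referenced[frame] = True

    def select_victim(self):
        """Sweep the hand; referenced frames get a second chance."""
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % len(self.referenced)
            if self.referenced[frame]:
                self.referenced[frame] = False   # clear and move on
            else:
                return frame                     # not recently used: replace it
```

The loop always terminates, because one full sweep clears every reference bit.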

Virtual Memory Manager Data Structures: The char- 
acteristics of the Virtual Memory Manager data struc- 
tures will now be described. 

Segment Table: The Segment Table (SIDTABLE) contains information describing the active segments. The table has the following characteristics. The table is pinned in real memory and its size is predetermined. It must be word-aligned in memory, and the segment table must be altered in a paging subsystem critical section.

External Page Table: The External Page Table (XPT) describes how a page is mapped to the disk. There is one XPT entry for each defined page of virtual memory. The XPT entries for a segment are allocated as contiguous entries when the segment is created. The XPT entries for a small segment, that is, one that is less than 1 megabyte, do not cross an XPT page boundary. The XPTs for a large segment, one larger than 1 megabyte, are aligned at the start of an XPT page. The XPT entries are allocated in units of 32 entries, which will map 65,536 bytes (64K) of virtual memory. Each entry requires 4 bytes. The table has the following characteristics. Only XPT root entries must be pinned into memory. Its size is predetermined, and it must be word-aligned. The virtual page number is the index into the XPT table. The XPT must be altered only in a Paging Subsystem critical section.
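The XPT sizing rules reduce to simple arithmetic. Since 32 entries map 64K, each page is 2,048 bytes. The sketch additionally assumes that an XPT page is itself 2K, i.e. 512 four-byte entries mapping 1 megabyte, which fits the 1 megabyte small/large split but is not stated outright. `place_segment` is a hypothetical bump allocator that only illustrates the boundary rules; it is not the Virtual Memory Manager's allocator.

```python
ENTRIES_PER_UNIT = 32                              # XPT entries are allocated 32 at a time
BYTES_PER_UNIT = 64 * 1024                         # 32 entries map 64K of virtual memory
ENTRY_SIZE = 4                                     # bytes per XPT entry
PAGE_SIZE = BYTES_PER_UNIT // ENTRIES_PER_UNIT     # 2,048-byte pages
XPT_PAGE_ENTRIES = PAGE_SIZE // ENTRY_SIZE         # assumption: 2K XPT pages hold 512 entries
SMALL_SEGMENT_LIMIT = XPT_PAGE_ENTRIES * PAGE_SIZE # 1 megabyte


def xpt_entries_for(segment_bytes):
    """XPT entries reserved for a segment, rounded up to whole 32-entry units."""
    units = -(-segment_bytes // BYTES_PER_UNIT)    # ceiling division
    return units * ENTRIES_PER_UNIT


def place_segment(next_free_index, segment_bytes):
    """Return (first, end) XPT indices for a new segment's contiguous entries."""
    count = xpt_entries_for(segment_bytes)
    offset_in_page = next_free_index % XPT_PAGE_ENTRIES
    is_large = segment_bytes > SMALL_SEGMENT_LIMIT
    if is_large or offset_in_page + count > XPT_PAGE_ENTRIES:
        # Large segments start on an XPT page; small ones may not straddle one.
        start = -(-next_free_index // XPT_PAGE_ENTRIES) * XPT_PAGE_ENTRIES
    else:
        start = next_free_index
    return start, start + count
```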

Inverted Page Table: The Inverted Page Table (IPT) describes the relationship between virtual addresses and real addresses, as discussed previously. The IPT consists of two arrays. The first one is primarily defined by the memory management unit, and contains the information that controls the translation function. The second array contains the Paging Subsystem page state information, used to control page fault processing. This array has the following characteristics. It is pinned, and its size is determined by the real memory size which is set at the Initial Program Load time (IPL). It is aligned according to real memory size. The real page number is the index into the IPT. Like the previous structures, it must be altered in a Paging Subsystem critical section.

Each real page frame has an entry in the IPT. All pages are on one of three lists.

There is one main list for each valid segment. It is doubly linked and anchored in the segment control block. This list links together all of the page frames assigned to the segment with a valid virtual address, and for which there may be a valid Translation Look-aside Buffer (TLB) entry.

There is one system-wide free list that links together the page frames that may be reassigned. This doubly linked, circular list is anchored in the IPT entry for page one. Pages on this list do not have a valid TLB entry, and accesses to them will always result in a page fault. Pages may be on both the main list and the free list. This is done so that the pages may be released without searching the free list. Unnamed (unhashed) pages are put at the head of the list, and named (hashed) pages are put at the tail.
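A doubly linked, circular free list anchored in IPT entry one, with unnamed frames added at the head and named frames at the tail, might look like the sketch below. The link arrays indexed by real page number, the class name `FreeFrameList`, and the assumption that frames are reassigned from the head are all illustrative choices, not details given in the text.

```python
class FreeFrameList:
    FREE_ANCHOR = 1   # the free list is anchored in the IPT entry for page one

    def __init__(self, frame_count):
        # next/prev links indexed by real page number. A frame linked to itself
        # is on no list; the anchor linked to itself means the list is empty.
        self.next = list(range(frame_count))
        self.prev = list(range(frame_count))

    def _link_after(self, node, frame):
        following = self.next[node]
        self.next[node], self.prev[frame] = frame, node
        self.next[frame], self.prev[following] = following, frame

    def free(self, frame, named):
        """Unnamed frames go to the head, named (hashed) frames to the tail."""
        if named:
            self._link_after(self.prev[self.FREE_ANCHOR], frame)  # tail
        else:
            self._link_after(self.FREE_ANCHOR, frame)             # head

    def take(self):
        """Unlink and return the head frame, or None if the list is empty."""
        frame = self.next[self.FREE_ANCHOR]
        if frame == self.FREE_ANCHOR:
            return None
        before, after = self.prev[frame], self.next[frame]
        self.next[before], self.prev[after] = after, before
        self.next[frame] = self.prev[frame] = frame
        return frame

    def free_order(self):
        """List the frames from head to tail."""
        order, node = [], self.next[self.FREE_ANCHOR]
        while node != self.FREE_ANCHOR:
            order.append(node)
            node = self.next[node]
        return order
```

Under the head-first assumption, unnamed frames, which hold nothing worth reclaiming, are reused first, while named frames stay reclaimable for longer.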

There is one system-wide I/O list that links together all of the pages currently being read or written to the disk. This doubly linked, circular list is anchored in the IPT entry for page two. Pages on this list do not have a valid TLB entry, and accesses to them will also result in a page fault. There must be only one page I/O list to ensure that I/O is processed first-in, first-out by block, even if non-first-in, first-out disk scheduling is performed.

Notification Control Block: A Notification Control Block (NCB) contains the information required to notify a virtual machine of the completion of an asynchronous paging request. The asynchronous request can be either a purge page range Service Call (SVC), or a page fault when asynchronous acknowledgement is allowed. An NCB is a self-describing control block in the system control block area. Its identifier field can be used to differentiate it from other types of control blocks in the system control block area. This is required since NCBs are queued on the same list as Process Control Blocks (PCBs). An NCB is pinned and allocated in the system control block area when needed. Like the previous structures, it must be altered in a Paging Subsystem critical section. An NCB is only allocated when the Page Fault Handler is performing a function on behalf of a process and, therefore, will not cause the system to abnormally terminate due to insufficient system control blocks.

Page Fault Wait Lists: The Virtual Memory Manager can place a process, either internal or virtual machine, on one of three circular wait lists.

There is one page I/O wait list for each frame in the system. A page's I/O wait list is anchored in the page's IPT entry and links together the Process Control Blocks (PCBs) of the processes synchronously waiting for I/O to complete to the page, and the NCBs of the processes asynchronously waiting for I/O completion notification. A process is placed in a page's I/O wait list when it reclaims the page with I/O in progress or it initiates a page in I/O as a result of a page fault.

There is one global system free page frame wait list. It links together the PCBs or NCBs for the processes that are waiting for a free page frame. This list is processed first-in, first-out. A process is placed on this list when it requires a free page frame and there is not one available. The process's PCB is enqueued on the list for synchronous waits, and an NCB is enqueued on the list for asynchronous waits.

Lastly, there is one global system page I/O wait list. It links together the PCBs or NCBs for the processes that are waiting for all page out I/O less than or equal to a specific page I/O level. This list is sorted by page I/O level. A process is placed on this list by several of the Virtual Memory Manager service calls to ensure that the contents of the disk match the contents in memory. A PCB is enqueued on the list for synchronous requests, or an NCB is enqueued on the list for asynchronous requests. Note that with non-first-in, first-out disk scheduling, the page I/O level may result in the process waiting longer than is required.
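The page I/O level wait list is a list sorted by level; as the page fault end description explains, each completed I/O raises the level, and every waiter whose level is less than or equal to it is readied or notified. The sketch below models the level as a simple completion counter (a simplification), represents PCBs and NCBs as opaque objects, and uses hypothetical names.

```python
import bisect
import itertools


class PageIOLevelWaitList:
    """Waiters sorted by the page I/O level they need; readied when it is reached."""

    def __init__(self):
        self.completed_level = 0             # count of page I/O operations completed
        self._waiters = []                   # sorted list of (level, arrival, waiter)
        self._arrival = itertools.count()    # tie-breaker keeps equal levels in arrival order

    def wait_for(self, level, waiter):
        """Return True if `level` is already reached; otherwise enqueue the waiter."""
        if level <= self.completed_level:
            return True
        bisect.insort(self._waiters, (level, next(self._arrival), waiter))
        return False

    def io_completed(self):
        """Advance the level by one and return the waiters that are now readied."""
        self.completed_level += 1
        readied = []
        while self._waiters and self._waiters[0][0] <= self.completed_level:
            readied.append(self._waiters.pop(0)[2])
        return readied
```

The arrival counter also guarantees that two waiters with the same level are never compared directly.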

Paging Mini-Disk Table: The paging mini-disk table controls the translation of Virtual Memory Manager slot numbers into the mini-disk I/O Device Number (IODN) and logical block number. The number of entries in this table defines the maximum number of mini-disks that the Virtual Memory Manager can perform paging operations to. This array has the following characteristics. It is pinned, its size is predetermined, and it is word-aligned. The paging space mini-disk entries are allocated at system initialization and must be the first entry/entries in the table. Mapped page range service calls allocate an entry for mapped mini-disks. The most significant bits of the disk address are the index into this table. As with the previous data structures, it must only be altered in a Virtual Memory Manager critical section.

Disk Allocation Bit Map: The Virtual Memory Man- 
ager maintains a bit map for each paging space mini- 
disk. Each bit indicates if its page is allocated or free. 
Bad slots are marked as allocated when the mini-disk is 
opened at system initialization. This array has the fol- 
lowing characteristics. It is not pageable, the paging 
space is allocated at page out time, the least significant 
bits of the disk address are the index into this array, and 
as with the previous structures, it must be altered only 
in a Virtual Memory Manager critical section. 
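Taken together, the mini-disk table and the disk allocation bit map split one disk address into two indices: the most significant bits select the paging mini-disk table entry and the least significant bits select the bit in the allocation map. The field width below (`SLOT_BITS = 20`) is purely hypothetical, since the text does not give it, and the class name is illustrative.

```python
SLOT_BITS = 20  # hypothetical width of the low-order (slot) part of a disk address


def split_disk_address(disk_address):
    """Split a disk address into (mini-disk table index, bit map index)."""
    return disk_address >> SLOT_BITS, disk_address & ((1 << SLOT_BITS) - 1)


class DiskAllocationBitMap:
    """One bit per paging space slot: 1 = allocated (or bad), 0 = free."""

    def __init__(self, slot_count, bad_slots=()):
        self.bits = bytearray((slot_count + 7) // 8)
        for slot in bad_slots:      # bad slots are marked allocated when opened
            self.mark_allocated(slot)

    def is_allocated(self, slot):
        return bool(self.bits[slot >> 3] & (1 << (slot & 7)))

    def mark_allocated(self, slot):
        self.bits[slot >> 3] |= 1 << (slot & 7)

    def mark_free(self, slot):
        self.bits[slot >> 3] &= ~(1 << (slot & 7)) & 0xFF
```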

Paging Device Extensions: One Paging Device Extension (PDX) exists for each paging space that the Virtual Memory Manager supports. A PDX is an extension for a paging space entry in the paging mini-disk table. The Virtual Memory Manager manages paging space, and the PDX is what is used to guide it in that management. The attributes of the PDX are: it is pinned, and it is allocated from the system control block area at system initialization. It is linked together in a list and anchored by a global pointer, and, as with previous structures, it must be altered only in a Virtual Memory Manager critical section. PDXs are not dynamically allocated. System initialization allocates all PDXs and initializes them.

Page Fault Processing: Synchronous page fault processing is the traditional type of page fault processing. In this operation, the faulting process is forced to wait until the I/O required to resolve the page fault is complete. The Virtual Memory Manager restarts the process at the completion of each I/O request. When redispatched, the process will either page fault, in which case additional I/O will be scheduled to resolve the fault, or will not page fault ... virtual machine receives a "page fault cleared" machine communication interrupt so that it can put its faulting task back on its ready task list. This allows page faults to be processed asynchronously with respect to the execution on the virtual machine. The virtual machine can force synchronous page fault processing by disabling page fault notification. It should be noted that page fault cleared interrupts cannot be directly disabled by a virtual machine. A page fault cleared interrupt is always given when the I/O is complete for a fault that has resulted in a page fault occurred interrupt. Page fault cleared interrupts can be indirectly disabled by disabling page fault occurred interrupts.

Synchronous Page Fault Processing: For synchro- 
nous faults, the Process Control Block (PCB) of the 
process that faulted is placed on either the page's I/O 
wait list or the free page frame list when the I/O is 
required. The process is placed on the page I/O wait list 
when the Virtual Memory Manager initiates I/O for the 
page or I/O for the page was already in progress. The 
process is placed on the free page frame list when there 
are no free page frames available to perform the I/O 
into. 

Asynchronous Page Fault Processing: When an asynchronous page fault occurs, the faulting virtual machine is notified of the segment identifier it faulted on, and the virtual address rounded down to the nearest page boundary. It is important to note that notification is not given for the address that the virtual machine faulted on, but for that page. For example, if a virtual machine faults on addresses x'806', x'856', and x'87E', it will get three page fault occurred notifications for x'800' and one page fault cleared notification for x'800'. A Notification Control Block (NCB) is allocated and chained to the page's I/O wait list when I/O is required. This is the same chain that PCBs are chained onto. The PCBs and NCBs are typed, so it is possible to tell them apart. A PCB is chained for a synchronous fault and an NCB is chained for an asynchronous fault.
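The rounding in the x'806' example is a mask of the page offset bits. The sketch assumes 2K (x'800') pages, which is consistent with all three faults reporting x'800'; the function name is hypothetical.

```python
PAGE_SIZE = 0x800  # assumption: 2K pages, consistent with the x'800' example


def notification_address(fault_address):
    """Round a faulting address down to the start of its page."""
    return fault_address & ~(PAGE_SIZE - 1)


faults = [0x806, 0x856, 0x87E]
print([hex(notification_address(a)) for a in faults])  # ['0x800', '0x800', '0x800']
```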

If the notification was given because of a page fault 
on the External Page Table (other than the original 
fault), then the Notification Control Block is chained off 
the IPT that the XPT is paged into, but the address of 
the original fault is in the Notification Control Block. 

The free frame wait list case is a special case. The virtual machine is notified and its Notification Control Block is chained, first-in, first-out, onto the free frame wait list along with PCBs. The first page out that causes a frame to be freed-up when this NCB is at the head of the free frame wait list will cause notification to be given.

Page Fault Occurred Notification: A page fault occurred interrupt is generated by the page fault handler upon determining that an asynchronous fault has occurred and I/O is required. No internal VRM queue element is required to perform this function. The page fault handler actually swaps the virtual machine's PSB and execution level. The premise that allows this is that page faults on machine communications or program check levels are processed synchronously, without notification. This implies that the interrupt does not need to be queued, because the virtual machine can always take page fault occurred interrupts.

Page Fault Cleared Notification: When the I/O for a page fault is complete, the Virtual Memory Manager will be called to clean up. The page fault complete interrupt is queued to the virtual machine by the VRM queue management function. This implies the need for a queue element. The Notification Control Block is used for that function.

Asynchronous Page Fault Scenario: A page fault is considered complete when each I/O it generates completes. A virtual machine will get 'n' total page fault occurred interrupts, and 'n' page fault complete interrupts, for a page fault that requires 'n' I/Os to satisfy. Example (n=3 here): Assume that the virtual machine faults asynchronously on a page that exists, but is not in memory, there are no free frames in memory to page it into, and the virtual memory manager faults on the XPT for the original page. The following lists the order of events (note that this scenario is not the typical case):

1. VM Page Faults

2. VMM Enqueues Page out requests to build up free page frame list

3. VMM Notifies virtual machine of Original Page Fault

4. VM is Dispatched (presumably it will task switch or wait)

5. Page out I/O completes

6. VMM Notifies virtual machine that the original Page Fault is resolved

7. VM is Dispatched

8. VM Page Faults again on the same address

9. VMM Page Faults on XPT

10. VMM Enqueues Page in request for that XPT

11. VMM Notifies virtual machine of Original Page Fault

12. VM is Dispatched (presumably it will task switch or wait)

13. The XPT Page in I/O completes

14. VMM Notifies virtual machine that the original Page Fault is resolved

15. VM is Dispatched

16. VM Page Faults again on the same address

17. VMM Enqueues Page in request for the page faulted on

18. VMM Notifies virtual machine of the Page Fault

19. VM is Dispatched (presumably it will task switch or wait)

20. The Page in I/O completes

21. VMM Notifies virtual machine that the original Page Fault is resolved

22. VM is Dispatched

Purge Page Range Notification: There is another way in the system to get a notification of I/O complete from the Virtual Memory Manager. This is on the asynchronous forced write option of the Purge Page SVC. One machine communications interrupt is presented to the virtual machine upon completion of the I/O for the Purge. Like page fault complete interrupts, this is given to the virtual machine regardless of whether the virtual machine enables page fault notification.

It works as follows: an NCB is chained on the page I/O level wait list, along with PCBs. The NCB records the page I/O level that must be achieved before the purge I/O can be considered complete. When that page I/O level is attained, the virtual machine will be notified.

Page Fault Handler: A large function of the page fault handler, namely the way it handles synchronous and asynchronous page faults, is discussed in "Page Fault Processing." In the following section, where statements are made such as "the faulter is notified," this means that if the faulter faulted asynchronously, it is notified; otherwise it is un-readied, as per previously described rules. This section describes the actual process that the page fault handler goes through to resolve a fault.

The page fault handler runs as an extension of the program check handler, at a lower interrupt level, below all interrupting devices. It runs in a back-track state, thus allowing it to page fault on the Virtual Memory Manager data structures.

When the page fault handler is entered, it immediately saves information about the fault, such as the virtual address. The reason that it does this is that, if it page faults within itself and that fault needs to do I/O, the page fault handler must know what address to give to the virtual machine for asynchronous notification. This implies that no page faults are allowed in the window between where the page fault handler has been backed out because of a page fault and where it is called again to service its own fault.

The page fault handler may be broken into several important steps:

Page Reclaim

If the page can be reclaimed, then the page fault handler is done. If page in or page out I/O is in progress to the page, then the faulter is chained onto the page's I/O wait list. Upon completion of the I/O, a test is made to see if any process is waiting on the frame and, if so, it is notified. Reclaim, therefore, is split across the page fault handler and page fault end. If the page is on the free list, then the faulter is re-dispatched after the page frame is made accessible. The faulter is not notified or forced to wait.

Building up the Free Page List

If the free list is found to be below a lower threshold, then page outs are initiated to build it up to an upper threshold. These thresholds are system tuning parameters. If the free list is still empty after attempting to replenish it, then the faulter will be notified of the original fault.

Clock with second chance is the technique used to select pages to be replaced.

Processing the Fault

The page fault handler involves itself with most of the Virtual Memory Manager structures, but most importantly, it examines the XPT for the page faulted on, and the page fault handler may fault at this time. It also allocates a paging space disk slot for the page.

Page Fault End: This procedure handles all I/O completion interrupts for the Virtual Memory Manager. It is scheduled for execution by the queue manager when the hard file device driver dequeues a Virtual Memory Manager request. Note that execution of this routine is delayed until the completion of any preempted Virtual Memory Manager critical section. Page fault cleared notification is given by this procedure according to the rules set in "Page Fault Processing." This procedure may not page fault and, therefore, no references are allowed to XPTs or other pageable data structures. There are two types of I/O that can complete for the Virtual Memory Manager:

Page in

Page out

All processes waiting on the frame are readied/notified. Also, the page I/O level is updated. This is a count of all the I/O operations that have completed. All processes waiting on a page I/O level less than or equal to the updated page I/O level are readied/notified when the oldest I/O operation completes. The frame is made accessible by validating the IPT tag word for all page in completions and reclaimed page out completions of an unreleased page. Otherwise, the frame is placed on the free list.

This procedure attempts to replenish the system control block area when the number of free system control blocks is below its upper threshold and a free frame exists. All processes waiting on a free system control block are then readied. This procedure is also responsible for waking up processes waiting for a free frame. A free frame is assigned to the process that has been waiting the longest for a free frame. This process is then notified/readied.

Paging Space: The Virtual Memory Manager supports paging to one or more paging spaces. Currently, the only paging device supported is a hardfile; however, the design has been made relatively flexible in this area for future expansion. A requirement of all paging spaces is that they be formatted for 512 byte blocks.

Paging Space Initialization: All paging spaces MUST be known to the Virtual Memory Manager at system initialization. If a user creates a paging space using the Mini-disk Manager, then, before the Virtual Memory Manager will page to it, the system must be re-IPLed, or reinitialized. The reason for this is that system initialization is the time that the Virtual Memory Manager space data structures are built. All paging ... allocation bit map are set up at Mini-disk initialization time. The Mini-disk Manager ... mini-disks, and when it finds a paging space mini-disk, it calls a routine which effectively defines the paging space to the Virtual Memory Manager. Before calling the define paging space routine, the Mini-disk Manager ... mini-disk (it will be left open). The way that the define paging space routine works is as follows:

1. Allocate a PDX for the paging space.

2. Initialize the PDX.

3. Initialize the paging mini-disk table.

4. Place the new PDX onto a linked list of all PDXs.

5. The PDX is made to point to its paging mini-disk table entry, and vice versa.

6. Initialize the disk allocation bit map (temporary disk map) for this space.

There is one disk allocation bit map, and it is partitioned among all paging spaces. The reason for having one bit map, rather than multiple, is that by packing paging spaces into one bit map, it will improve the locality of reference to the bit map. The XPTs for the bit map are set such that the bit map is initially logically zero. If a paging space is not a multiple of 64K, then initialization rounds the size up to the next 64K boundary, and marks the blocks (bits) in between as allocated. This requires the ability of system initialization to take a first reference page fault at this time.

After defining a paging space, the Mini-disk Manager then checks for bad blocks on the paging space. If a bad paging space block is found, the Mini-disk Manager will call a routine to mark the bad paging space blocks as allocated in the Virtual Memory Manager temporary disk map. This way, the Virtual Memory Manager will never use them. The Mini-disk Manager will then do bad block relocation on that paging space in the future.

Paging Space Management: Paging disk blocks are allocated one at a time, in a circular fashion per paging space. A pointer is kept to the last place allocated in each paging space. On the next allocation in that particular paging space, the search for an empty slot starts at the last allocated slot and incrementally runs through the paging space (wrapping around at the end). The idea behind allocating in this fashion is to improve page out affinity and page ahead. The circular pointer through a paging space can be thought of as pointing to the "oldest" spot on that paging space, or, in other words, the spot that was written out the longest ago. There is a reasonably good probability that that disk slot will be free now (since it was allocated a long time ago). All disk slots are allocated at page out time, so if a large purge page range is performed, causing a lot of slots to be allocated at once, this will allocate them close together. This is assuming that the purge is being done to page out a working set of a particular process, or entity in the virtual machine. When that process becomes active again, its working set is close together on disk, minimizing arm movement and maximizing page ahead efficiency.
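The circular allocation just described can be sketched for a single paging space as follows. The boolean list stands in for that space's portion of the disk allocation bit map, and the class and method names are hypothetical.

```python
class PagingSpace:
    """Circular, "oldest spot first" slot allocation for one paging space."""

    def __init__(self, slot_count):
        self.allocated = [False] * slot_count
        self.last = 0   # the last place allocated in this paging space

    def allocate(self):
        """Return the next free slot, searching forward from the last allocation."""
        n = len(self.allocated)
        for step in range(n):
            slot = (self.last + step) % n   # wrap around at the end
            if not self.allocated[slot]:
                self.allocated[slot] = True
                self.last = slot
                return slot
        return None  # paging space is full

    def release(self, slot):
        self.allocated[slot] = False
```

Because consecutive allocations advance the same pointer, a burst of page outs (such as a large purge) receives neighbouring slots.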

In the presence of more than one paging space, they 
each, individually, behave as previously described. The 
Virtual Memory Manager disk allocation will decide 
which paging mini-disk to allocate a block to. The disk 
scheduler will keep track of where the disk arm is (ap- 
proximately). The Virtual Memory Manager utilizes 
this by attempting to allocate on the paging space 
whose point of last allocation is closest to where the 
disk arm is (for all disks). 
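Choosing among several paging spaces by arm proximity can be sketched as a minimum-distance selection. The model below, in which each paging space is described by its disk and the cylinder of its last allocation and the scheduler reports an approximate arm cylinder per disk, is a hypothetical simplification.

```python
def choose_paging_space(spaces, arm_positions):
    """Pick the paging space whose last allocation is closest to its disk's arm.

    spaces: name -> (disk, cylinder of last allocation)   (hypothetical model)
    arm_positions: disk -> approximate current arm cylinder
    """
    def distance(name):
        disk, last_cylinder = spaces[name]
        return abs(last_cylinder - arm_positions[disk])

    return min(spaces, key=distance)


spaces = {"pg0": ("hd0", 100), "pg1": ("hd1", 40)}
print(choose_paging_space(spaces, {"hd0": 90, "hd1": 300}))  # pg0
```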

Virtual Memory Manager SVCs: The Virtual Memory Manager SVCs all execute as extensions of the virtual machine. These SVCs can result in explicit I/O, such as a page out of a purged page, or implicit I/O, such as page faults on code, stack, or data. All I/O for synchronous SVCs will place the virtual machine in a synchronous page fault wait state until the I/O is complete. Only implicit I/O for asynchronous SVCs will place the virtual machine in a synchronous page fault wait state until the I/O is complete. Explicit I/O will be initiated and the virtual machine notified upon completion.

Special Program Check Error Processing: Program check errors that occur while executing code within a virtual machine are reported to the virtual machine via a program check virtual interrupt. Program check errors that occur while executing within the VRM result in an abnormal system termination. VRM SVCs execute within the VRM and perform functions on behalf of a virtual machine. Therefore, the program check handler looks at a value in low memory to determine whether errors that occur within VRM SVC code are to be reported to the virtual machine as a program check virtual interrupt, with the old IAR specifying the SVC, or whether the errors are an abnormal system termination.

Selected VMM SVCs use this facility to save path length by not checking for error conditions when accessing parameter lists. The program check handler performs the error recovery for them.

Virtual Memory Manager Services: All Virtual Memory Manager services execute synchronously with respect to the caller. Several of these services can result in page faults, in which case the process of the caller is forced to wait for the page fault to be resolved.

Asynchronous vs. Synchronous Page Faults: The VRM supports both synchronous and asynchronous page fault processing for virtual machines. With synchronous page fault processing, the entire virtual machine is forced to wait until the page fault is resolved. With asynchronous page fault processing, the virtual machine is allowed to dispatch other tasks. Only the faulting task is forced to wait until the page fault is resolved. Because of this, any operation that results in synchronous page fault processing will tend to lower the number of concurrent tasks that can be executed, while any operation that results in asynchronous page fault processing will tend to maximize the number of concurrent tasks that can be executed.

FIG. 6 illustrates two additional data structures that are uniquely associated with the Map Page Range services which incorporate the method of the present invention. The first data structure is map node 70, which is dynamically created when a file is to be mapped; the second is the mapped file page structure 71, which resembles the general format of an External Page Table (XPT), discussed earlier.

The map node 70, as shown in FIG. 6, includes four fields designated 72-75. Field 72 is designated the segment ID and functions to store the segment identifier that is to be used to store the mapped file. Field 73 is the map count field, which functions to keep track of the number of users who have concurrently requested that the file be mapped, other than copy-on-write type of mapping. Field 74 of map node 70 is designated the CW segment ID or copy_on_write segment ID, which identifies the unique segment ID that is used exclusively for the copy-on-write segment. Field 75 is the copy-on-write map count field, which functions to keep track of the number of users who are sharing this copy-on-write segment.

The data structure 80 is a specific section of the segment table used to store the segment IDs of segments that are being shared by more than one user.

The mapped file page structure 71 is similar to an XPT, in that it includes an entry for each page of the file that is mapped. Entries, as shown in FIG. 6, include a protection field 81, a page status field 82, and a disk address field 83. The mapped file page structure is allocated from the XPT pool 86, shown diagrammatically in FIG. 6.

The dotted line block labeled 90 represents virtual memory. Segments of the memory addressable by the segment registers are designated 91, while a page of a segment is designated by reference character 92.

Block 95 represents a process running in the system. Block 96 represents a list of segment identifiers for segments associated with the running process. These IDs are loaded into appropriate segment registers when the process "n" has its turn on the system. The 32 bit effective address is converted to a 40 bit virtual address consisting of a 12 bit segment identifier and a 28 bit offset, as explained earlier in the application. The 12 bit segment ID is provided by one of the 16 segment registers that was selected by the 4 high order bits of the 32 bit effective address.

The VRM includes a fault handler which includes two separate functions represented by blocks 97 and 98, respectively, in FIG. 6. Block 97 functions to address system memory and provide block 98 with a page fault interrupt when the requested page is not in main memory. Block 98 functions to resolve the page fault through access to the mapped file page structure, since it contains the disk address in field 83, as described earlier.
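The FIG. 6 structures can be modelled as plain records, with block 98's job reduced to looking up field 83 and reading that disk address. This is an illustrative sketch only: the string encodings of fields 81 and 82, the `read_block` callback, and the helper names `map_file` and `resolve_page_fault` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class MapNode:
    """Map node 70 of FIG. 6."""
    segment_id: Optional[int] = None      # field 72: segment holding the mapped file
    map_count: int = 0                    # field 73: concurrent (non copy-on-write) mappers
    cw_segment_id: Optional[int] = None   # field 74: copy-on-write segment ID
    cw_map_count: int = 0                 # field 75: users sharing the copy-on-write segment


@dataclass
class MappedFilePageEntry:
    """One entry of mapped file page structure 71 (one per mapped file page)."""
    protection: str    # field 81 (string encoding is hypothetical)
    page_status: str   # field 82 (string encoding is hypothetical)
    disk_address: int  # field 83


def map_file(node: MapNode, new_segment_id: int) -> int:
    """Share an existing mapping, or adopt new_segment_id for the first mapper."""
    if node.segment_id is None:
        node.segment_id = new_segment_id
    node.map_count += 1
    return node.segment_id


def resolve_page_fault(page_structure: Dict[int, MappedFilePageEntry],
                       page_number: int,
                       read_block: Callable[[int], bytes]) -> bytes:
    """Block 98: find the faulting page's disk address (field 83) and read it."""
    entry = page_structure[page_number]
    return read_block(entry.disk_address)
```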

FIG. la is a flow chart, illustrating the major steps 
involved in the system when an application process "n 1 * 
performs various UNIX type System Calls. In block 100 
the first system call is to open a file. Block 101 performs 
the open file operation* The file may be opened as a 
read/write file, read only file, or a write only file. Block 
102 functions to read the inode into main memory from 
a directory reference 103 that is kept by the UNIX file 
management system. 

Assuming that the file has been opened, the next system call is a SHMAT(2) read only call to block 104. Block 105 determines if the file is currently mapped by reference to the segment table. If the segment is not currently mapped, a test is made in block 106 to see if the segment has been created. If the segment has not been created, block 107 creates the segment. Block 108 functions to increment the reference count in the segment count field. Block 109 functions to get the segment ID, while block 110 loads the segment register. If the segment is currently mapped, as determined by block 111, the addressing operation for the read system call is complete. If the file is not currently mapped, a Map Page Range service call is initiated to map the range of pages that are active in the system, as indicated by the file's External Page Table. Block 112 functions to create the map node 70 and the mapped file page structure 71, shown in FIG. 6.

The actual read operation, represented by block 120, checks to see if the file is mapped, as shown in block 121. If the file is not mapped, block 122 does a standard read operation. A similar operation is done for a write operation by block 123.

For a read or write operation when the file is mapped, block 124 converts the file offset and length parameters of the UNIX System Call to segment and offset parameters. Block 125 gets the segment register ID from the shared segment table for the I/O operation if the system call is for a copy-on-write operation or a read/write operation. Block 126 tests to see if a write operation is involved and, if so, a new block is allocated on the disk file in block 127. If a write operation is not involved, block 127 is bypassed and block 128 does a copy between the disk and main memory. Block 129 then re-dispatches the process.

FIG. 7b illustrates Process A performing a SHMAT read/write system call, as indicated by block 130. Block 131 tests to see if the file is currently mapped for a read/write operation. If not, block 132 tests to see if the segment exists. If the segment does not exist, block 133 creates a memory segment for the mapped file, while blocks 134 and 135 get and load the segment register with the segment ID for the mapped file. Block 136 tests to see if the file is mapped and, if so, the function is complete. If the file is not currently mapped read/write, the Map Page Range service block 137 performs a page mapping to create the data structures 70 and 71 of FIG. 6.

The major steps performed by the Map Page Range service block 112 or 137 are illustrated in FIG. 7c. After a segment has been created, the file must be mapped into the segment. This is a dynamic operation, since the primary storage allocation is virtual and the segment assignment is transient. As illustrated in FIG. 7c, the inode structure 181 is read for the block address of each page to be allocated for the file. Each group of contiguously allocated blocks is summed, and the count is recorded in the field adjacent to the starting block number entry in the Map Page Range structure. Discontiguous blocks are reflected in discrete entries in the Map Page Range structure. When the entire file inode structure has been scanned, the Map Page Range SVC is issued and the external page table slot entries for the appropriate segment are updated with the block addresses for each page of the file.

FIG. 8 describes and illustrates the extending of a segment for a mapped file. The operation is as follows. Block 200 represents an application program being executed on the system that is to create and extend a new file. Block 201 represents the application of block 200 executing a specific process which involves creating a new file, as represented in block 202. Subsequently, process A issues an Open System Call in block 203 to open file X in a Read/Write mode or a Write mode. The above described operations are normal conventional UNIX functions for creating and opening a file.

After the file is opened, process A issues a SHMAT Read/Write call to write to file X. This call results in the Create segment operation, which assigns a segment ID to the file and protects all pages "read only" in block 206. No pages have been allocated at this point in the process, but the segment boundary has been established and the map node data structure updated in block 207. Process A is re-dispatched in block 208.

At some subsequent time, process A issues a Read/Write or load/store operation in block 209, specifying a virtual address which is checked in block 210 to ensure that it is within the current segment boundaries. It will be assumed that the address is within the current boundary, but since the pages were previously protected in block 206, a protection exception in block 211 is [...] represented by block 212. A check is then made in block 213 to see whether the address is beyond the maximum [...]. If it is, the operation is terminated, as indicated in block 214. If the address does not exceed the maximum permissible size, block 215 increases the segment size in increments of 64K to the desired size. Block 216 allocates pages [...] and block 217 [...] the requested address up to the new boundary. The process is then re-dispatched, as indicated in block 218.

Blocks 219-222 are similar in function to blocks 211-214, but are illustrated in FIG. 8 to clarify the process flow. Block 223 checks to see if the requested page is in real memory. If not, block 224 allocates a page in main memory. If the page is not in real memory, a "write new" supervisory call is issued by the system in block 225, which allocates a new disk block from the free list represented in block 226. The process is re-dispatched in block 228, after the map range page segment is extended in block 227.

While the invention has been described and illustrated with respect to a specific embodiment, it will be appreciated by those persons skilled in the art that changes and modifications may be made without departing from the spirit of the invention and the scope of the appended claims.

We claim:

1. In a page segmented virtual memory data processing system comprising,

(A) a main memory,

(B) a secondary storage device having a plurality of locations each of which stores a block of data comprising at least one virtual page, each of said locations having a different block address,

(C) a UNIX type operating system (UOS),

(D) an application program for processing UNIX files, each said file being stored on said device at identified block addresses,

(E) a memory manager for managing:

(1) the allocation of virtual address space in said system, and

(2) the transfer of information between said secondary storage device and said main memory, said memory manager including,

(3) an external page table (XPT) including a plurality of XPT entries for correlating each said block address to at least one virtual page address, each said entry including a status field for indicating to said system the status of the corresponding virtual system, and

(F) means for selectively mapping said files to designated segments of the virtual address space of said virtual memory,

a method to automatically manage the size of each said designated segment through allocation of increments of XPT entries in accordance with the addresses in the instructions of said application program that are executed during said processing of a UNIX file, said method comprising the combination of the following sequential steps,

(1) establishing for said system with said operating system, an allocatable increment of sequential virtual addresses comprising a fixed plurality of XPT entries, each said increment having a virtual address that defines the Upper Boundary of said increment,

(2) allocating, with said operating system, a number of said increments to provide at least one XPT entry for each block on said device that contains information associated with said file being processed resulting in the current size of said segment being defined by said Upper Boundary of said last allocated segment,

(3) establishing in each said XPT entry a status field 
for indicating to said system one of a plurality of 
processing states of the virtual page correspond- 
ing to said entry, 

(4) mapping said file to correlate the address of said each block to the virtual page address corresponding to the associated said one XPT entry, including the steps of,

(a) setting the status field of each said associated XPT entry to a first value, and

(b) setting the status fields of said XPT entries between the virtual address representing the End of File (EOF) and the virtual address representing the end of the current segment, to a second value to indicate allocated, unmapped XPT entries,

(5) interrupting the current instruction being executed in said application program when said current instruction involves a virtual address contained on an unmapped virtual page to indicate to said system that one of a plurality of predetermined protection exceptions has occurred,

(6) automatically allocating with said operating system sufficient said increments when a first of said predetermined exceptions has been indicated, to accommodate said virtual address involved with said current instruction,

(7) automatically mapping with said operating system at least one newly allocated disk block to one of said unmapped XPT entries when a second one of said predetermined exceptions has been indicated, and

(8) reissuing said current instruction that was interrupted.
2. The method recited in claim 1 in which said main memory includes a plurality of page frames, each of which stores one virtual page comprising a plurality of sequential byte locations, each of which has a different virtual address, further including the step of,
(A) determining with said memory manager that said current instruction is attempting to write to a virtual address location that is beyond said EOF address, and that the XPT entry corresponding to the virtual page containing said virtual address has been allocated but has not been mapped to cause said current instruction to be interrupted with said step of interrupting and said second one of said plurality of predetermined protection exceptions to be indicated to said system.

3. The method recited in claim 2 in which said mem- 
ory manager includes a page faulting mechanism includ- 
ing means to convert a virtual address associated with 
said instruction to the storage location of the corre- 
sponding XPT entry only when said virtual address is 
contained on a virtual page that has been mapped, and 
in which said step of determining includes the further 
steps of, 

(A) invoking said page faulting mechanism in re- 
sponse to not locating said virtual address of said 
current instruction in said main memory, 

(B) converting said virtual address to the said loca- 
tion of said corresponding XPT entry, and 

(C) reading the status field of said corresponding 
XPT entry to determine that said XPT entry has 
been allocated but has not been mapped. 

4. The method recited in claim 3 in which said mem- 
ory manager further includes an Inverted Page Table 
having a separate entry for each page frame in said main 
memory for correlating virtual page addresses to page 
frame addresses in said main memory, including the 
further step of, 

(A) allocating a new page frame in main memory, and 

(B) protecting the corresponding said IPT entry to 
permit only read operations on said page frame 
prior to reissuing said interrupted instruction. 

5. The method recited in claim 4 including the further 
steps of, 

(A) interrupting said reissued instruction when said 
memory manager determines that said new virtual 
page in said new page frame is protected read only, 

(B) allocating a new block location on said device for 
storing said new virtual page, 

(C) mapping the address of said new block to said 
corresponding unmapped XPT entry, 

(D) updating said protected IPT and XPT entries to 
permit said new virtual page to be written into, and 

(E) reissuing said interrupted instruction a second 
time to write into said new virtual page. 

6. The method recited in claim 1 in which said memory manager further includes an Inverted Page Table (IPT) having an IPT entry for each said page frame for correlating the address of said page frame to the virtual address of the virtual page stored in said page frame, and said step of indicating includes the further step of,

(A) determining with said memory manager that an 
instruction is attempting to write into a virtual 
address location that is beyond said Upper Bound- 
ary so said first one of said plurality of predeter- 
mined protection exceptions is indicated to said 
system. 

7. The method recited in claim 6 in which said step of 
determining includes the further steps of, 

(A) checking said IPT with said memory manager to 
determine if a page fault has occurred because no 
page frame in main memory is storing a virtual 
page containing said protected virtual address, 

(B) checking said XPT with said memory manager when said step of checking said IPT indicates that a page fault has occurred, to determine if an allocated XPT entry has an associated virtual page that contains said protected virtual address, and

(C) indicating to said system that said protected address is beyond said Upper Boundary if said steps of checking do not find an XPT entry having a virtual page containing said protected address.

8. The method recited in claim 7 in which said memory manager further includes a page fault handling mechanism for resolving page faults that occur in said system, including means to convert a virtual address associated with said instruction being executed to the storage location of the corresponding XPT entry only when said virtual address is contained on a virtual page that has been mapped, and in which said steps of checking include the further steps of,

(A) invoking said page faulting mechanism after said 
first checking step in response to said page fault, 

(B) converting said virtual address to a table address where the address of a corresponding XPT entry would be located if an XPT entry had been allocated for that virtual address, and

(C) invoking said step of indicating after concluding that an XPT entry has not been allocated for said virtual address that caused said page fault when said virtual address is not located at said table address.

9. The method recited in claim 8 in which said step of converting includes the further steps of,

(A) hashing said virtual address to provide said table address, and

(B) storing at said table address the addresses in a 
chained fashion of all XPT entries whose corre- 
sponding virtual pages hash to that table address. 

10. The method recited in claim 9 in which said step 
of automatically allocating includes the further steps of, 

(A) increasing the size of said segment to accommodate the faulting virtual address by allocating additional increments of XPT entries sufficient to establish a new Upper Boundary beyond said faulting virtual address in response to an indication by said step of indicating that a first said protection exception has been indicated.

11. The method recited in claim 10 further including 
the step of, 

(A) interrupting the reissued instruction to indicate to 
said system that a second one of said predetermined 
protection exceptions has occurred because the 
value of the status field of the XPT entry associated 
with said faulting address indicates that said entry 
has not been mapped, and 

(B) issuing said instruction a third time after a page 
frame in main memory has been allocated, a disk 
block has been allocated, and a new EOF has been 
established and the appropriate entries in said XPT 
and IPT have been updated to correct said protec- 
tion exception. 
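The two-exception sequence of FIG. 8 and claims 1, 10 and 11 can be modeled compactly. A store beyond the Upper Boundary takes the first exception, and the segment then grows by whole increments of XPT entries. A store to an allocated but unmapped entry takes the second exception, and a new disk block is then mapped. In each case the instruction is reissued. This is a minimal sketch under stated assumptions, not the patented implementation. The 64K increment comes from the FIG. 8 description. The 2048-byte page, the 256 MB maximum segment size, the free-list stand-in and all class and method names are hypothetical. Page frames, IPT read-only protection and the hashed XPT lookup of claim 9 are omitted, and a plain dictionary stands in for the XPT.

```python
from enum import Enum

PAGE_SIZE = 2048       # assumed page size
INCREMENT = 64 * 1024  # segment grows in 64K increments, per the FIG. 8 description
MAX_SEGMENT = 1 << 28  # assumed maximum segment size (256 MB)


class Status(Enum):
    MAPPED = 1              # "first value": entry correlated with a disk block
    ALLOCATED_UNMAPPED = 2  # "second value": entry allocated, no disk block yet


class SegmentTooLarge(Exception):
    """Raised when a faulting address is beyond the maximum permissible size."""


class MappedSegment:
    def __init__(self, file_blocks):
        self.xpt = {}                # page number -> [Status, block address or None]
        self.data = {}               # stored values, keyed by virtual address
        self.exceptions = []         # log of the protection exceptions taken
        self.upper_boundary = 0      # first virtual address past the segment
        self.next_free_block = 1000  # hypothetical stand-in for the disk free list
        # Assumes the file ends on a page boundary.
        self.eof = len(file_blocks) * PAGE_SIZE
        self._grow_to(self.eof)
        for page, block in enumerate(file_blocks):
            self.xpt[page] = [Status.MAPPED, block]

    def _grow_to(self, limit):
        """Allocate whole increments of XPT entries until limit <= Upper Boundary."""
        new_boundary = self.upper_boundary
        while new_boundary < limit:
            new_boundary += INCREMENT
        if new_boundary > MAX_SEGMENT:
            raise SegmentTooLarge(hex(limit))
        # Newly allocated entries start out allocated but unmapped.
        for page in range(self.upper_boundary // PAGE_SIZE, new_boundary // PAGE_SIZE):
            self.xpt[page] = [Status.ALLOCATED_UNMAPPED, None]
        self.upper_boundary = new_boundary

    def store(self, vaddr, value):
        """Execute a store, fixing up protection exceptions and reissuing it."""
        while True:
            if vaddr >= self.upper_boundary:
                # First exception: beyond the Upper Boundary, so extend the segment.
                self.exceptions.append("beyond-boundary")
                self._grow_to(vaddr + 1)
                continue  # reissue the interrupted instruction
            page = vaddr // PAGE_SIZE
            status, block = self.xpt[page]
            if status is Status.ALLOCATED_UNMAPPED:
                # Second exception: allocated but unmapped, so map a new disk block.
                self.exceptions.append("unmapped")
                self.xpt[page] = [Status.MAPPED, self.next_free_block]
                self.next_free_block += 1
                continue  # reissue the instruction again
            self.data[vaddr] = value
            self.eof = max(self.eof, vaddr + 1)  # a write past EOF sets a new EOF
            return block
```

Consider a store at address 70000 in a two-page file. It takes both exceptions in turn. First, the segment grows from 64K to 128K. Next, page 34 is mapped to a newly allocated block. Finally, the third issue of the store succeeds, which parallels the sequence recited in claim 11.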